Using Expectation-Maximization for Reinforcement Learning
Authors
Abstract
We discuss Hinton’s (1989) relative payoff procedure (RPP), a static reinforcement learning algorithm whose foundation is not stochastic gradient ascent. We show circumstances under which applying the RPP is guaranteed to increase the mean return, even though it can make large changes in the values of the parameters. The proof is based on a mapping between the RPP and a form of the expectation-maximization procedure of Dempster, Laird, and Rubin (1977).
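As a rough illustration of the reward-weighted update the abstract refers to, the sketch below implements an RPP-style step for a vector of independent Bernoulli action units in Python. It is not the authors' code: the toy return function, the number of units, and the trial counts are assumptions chosen purely for demonstration. The update sets each firing probability to the reward-weighted average of the sampled actions, the kind of step the paper relates to a form of EM; with nonnegative returns it keeps probabilities in [0, 1] even though a single step can move the parameters a long way.

```python
import numpy as np

# Minimal sketch (not the authors' code) of an RPP-style update for a vector
# of independent Bernoulli action units, assuming nonnegative returns.
# Each unit i fires (a_i = 1) with probability p_i; the update moves p_i to
# the reward-weighted average firing rate over a batch of trials.

rng = np.random.default_rng(0)

def sample_actions(p, n_trials, rng):
    """Sample n_trials binary action vectors, one Bernoulli draw per unit."""
    return (rng.random((n_trials, p.size)) < p).astype(float)

def rpp_update(p, actions, returns, eps=1e-12):
    """Relative-payoff-style step: p_i <- sum_t r_t * a_{t,i} / sum_t r_t."""
    returns = np.asarray(returns, dtype=float)
    weighted = returns @ actions          # sum_t r_t * a_{t,i}, one entry per unit
    return weighted / (returns.sum() + eps)

# Toy problem (illustrative assumption): return peaks when the sampled
# action vector matches a fixed target pattern.
target = np.array([1.0, 0.0, 1.0, 1.0])

def toy_return(a):
    return np.exp(-np.abs(a - target).sum())   # nonnegative by construction

p = np.full(4, 0.5)
for _ in range(20):
    A = sample_actions(p, n_trials=200, rng=rng)
    r = np.array([toy_return(a) for a in A])
    p = rpp_update(p, A, r)

print(np.round(p, 2))   # probabilities should drift toward the target pattern
```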
Similar Resources
Cooperative Multi Robot Path Planning using uncertainty Covariance as a feature EECS-545 Final Report
The project was aimed at applying machine learning tools to the problem of multi-robot path planning. We have designed a Reinforcement Learning (RL) based multi-agent planner that maximizes the information gained while keeping the robots well localized. We have modified a 2D laser scan matcher to recover a multimodal distribution using the Expectation Maximization (EM) algorithm to ...
Inverse Reinforcement Learning Under Noisy Observations (Extended Abstract)
We consider the problem of performing inverse reinforcement learning when the trajectory of the expert is not perfectly observed by the learner. Instead, noisy observations of the trajectory are available. We generalize the previous method of expectation-maximization for inverse reinforcement learning, which allows the trajectory of the expert to be partially hidden from the learner, to incorpo...
Online Expectation Maximization for Reinforcement Learning in POMDPs
We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only in the current learning episode, discarding the episode after the evaluation and memorizing the sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has a time complexity O(n) and a memory complexity O(...
A Biologically Plausible 3-factor Learning Rule for Expectation Maximization in Reinforcement Learning and Decision Making
One of the most frequent problems in both decision making and reinforcement learning (RL) is expectation maximization involving functionals such as reward or utility. Generally, these problems consist of computing the optimal solution of a density function. Instead of trying to find this exact solution, a common approach is to approximate it through a learning process. In this work we propose a...
Expectation Maximization for Weakly Labeled Data
We call data weakly labeled if it has no exact label but rather a numerical indication of the correctness of the label "guessed" by the learning algorithm, a situation commonly encountered in problems of reinforcement learning. The term emphasizes similarities of our approach to the known techniques of solving unsupervised and transductive problems. In this paper we present an online algorithm that...
Journal title: Neural Computation
Volume 9, Issue -
Pages -
Publication date: 1997